
43 research outputs found

Probabilistic Protein Function Prediction from Heterogeneous Genome-Wide Data

Author: Kasif Simon
Kolaczyk Eric D.
Nariai Naoki
Publication venue: Public Library of Science
Publication date: 01/03/2007

Dramatic improvements in high-throughput sequencing technologies have led to a staggering growth in the number of predicted genes. However, a large fraction of these newly discovered genes do not have a functional assignment. Fortunately, a variety of novel high-throughput genome-wide functional screening technologies provide important clues that shed light on gene function. The integration of heterogeneous data to predict protein function has been shown to improve the accuracy of automated gene annotation systems. In this paper, we propose and evaluate a probabilistic approach for protein function prediction that integrates protein-protein interaction (PPI) data, gene expression data, protein motif information, mutant phenotype data, and protein localization data. First, functional linkage graphs are constructed from PPI data and gene expression data, in which an edge between nodes (proteins) represents evidence for functional similarity. The assumption here is that graph neighbors are more likely to share protein function than proteins that are not neighbors. The functional linkage graph model is then used in concert with protein domain, mutant phenotype and protein localization data to produce a functional prediction. Our method is applied to the functional prediction of Saccharomyces cerevisiae genes, using Gene Ontology (GO) terms as the basis of our annotation. In a cross-validation study we show that, at 50% precision, the integrated model increases recall by 18% compared to using PPI data alone. We also show that the integrated predictor is significantly better than each individual predictor. However, the observed improvement vs. PPI depends on both the new source of data and the functional category to be predicted. Surprisingly, in some contexts integration hurts overall prediction accuracy. Lastly, we provide a comprehensive assignment of putative GO terms to 463 proteins that currently have no assigned function.

Public Library of Science (PLOS)

Boston University Institutional Repository (OpenBU)

Directory of Open Access Journals

PubMed Central

Integration of relational and hierarchical network information for protein function prediction

Author: Jiang Xiaoyu
Kasif Simon
Kolaczyk Eric D
Nariai Naoki
Steffen Martin
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: In the current climate of high-throughput computational biology, the inference of a protein's function from related measurements, such as protein-protein interaction relations, has become a canonical task. Most existing technologies pursue this task as a classification problem, on a term-by-term basis, for each term in a database, such as the Gene Ontology (GO) database, a popular rigorous vocabulary for biological functions. However, ontology structures are essentially hierarchies, with certain top to bottom annotation rules which protein function predictions should in principle follow. Currently, the most common approach to imposing these hierarchical constraints on network-based classifiers is through the use of transitive closure to predictions. Results: We propose a probabilistic framework to integrate information in relational data, in the form of a protein-protein interaction network, and a hierarchically structured database of terms, in the form of the GO database, for the purpose of protein function prediction. At the heart of our framework is a factorization of local neighborhood information in the protein-protein interaction network across successive ancestral terms in the GO hierarchy. We introduce a classifier within this framework, with computationally efficient implementation, that produces GO-term predictions that naturally obey a hierarchical 'true-path' consistency from root to leaves, without the need for further post-processing. Conclusion: A cross-validation study, using data from the yeast Saccharomyces cerevisiae, shows our method offers substantial improvements over both standard 'guilt-by-association' (i.e., Nearest-Neighbor) and more refined Markov random field methods, whether in their original form or when post-processed to artificially impose 'true-path' consistency. Further analysis of the results indicates that these improvements are associated with increased predictive capabilities (i.e., increased positive predictive value), and that this increase is consistent uniformly with GO-term depth. Additional in silico validation on a collection of new annotations recently added to GO confirms the advantages suggested by the cross-validation study. Taken as a whole, our results show that a hierarchical approach to network-based protein function prediction, that exploits the ontological structure of protein annotation databases in a principled manner, can offer substantial advantages over the successive application of 'flat' network-based methods.

Crossref

Boston University Institutional Repository (OpenBU)

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

SUGAR: graphical user interface-based data refiner for high-throughput DNA sequencing

Author: Kaname Kojima
Mamoru Takahashi
Masao Nagasaki
Naoki Nariai
Takahiro Mimori
Yosuke Kawai
Yukuto Sato
Yumi Yamaguchi-Kabata
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Crossref

Springer - Publisher Connector

HLA-VBSeq: accurate HLA typing at full resolution from whole-genome sequencing data

Author: C Marks
D Comas
E Dawson
E Major
H Li
HA Elsner
JE Levine
Jun Yasuda
K Hosomichi
K Matsuki
Kaname Kojima
M Carrington
Masao Nagasaki
MI Jordan
MR Lincoln
N Nariai
Naoki Nariai
R Horton
RL Erlich
RL Warren
S Boegel
Sakae Saito
T Shiina
Takahiro Mimori
Y Bai
Y Morishima
Yosuke Kawai
Yukuto Sato
Yumi Yamaguchi-Kabata
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Type 1 diabetes risk genes mediate pancreatic beta cell survival in response to proinflammatory cytokines

Author: Aylward Anthony
Beebe Elisha
Benaglio Paola
Chiou Joshua
Corban Sierra
Donovan Margaret K.R.
Elgamal Ruth
Frazer Kelly A.
Gaulton Kyle J.
Kaur Jaspreet
Korgaonkar Katha
Miller Michael
Nariai Naoki
Newsome Jacklyn
Okino Mei Lin
Preissl Sebastian
Qiu Yunjiang
Ren Bing
Sander Maike
Taipale Jussi
Wang Gaowei
Yan Jian
Zhu Han
Publication date: 01/12/2022

We combined functional genomics and human genetics to investigate processes that affect type 1 diabetes (T1D) risk by mediating beta cell survival in response to proinflammatory cytokines. We mapped 38,931 cytokine-responsive candidate cis-regulatory elements (cCREs) in beta cells using ATAC-seq and snATAC-seq and linked them to target genes using co-accessibility and HiChIP. Using a genome-wide CRISPR screen in EndoC-βH1 cells, we identified 867 genes affecting cytokine-induced survival, and genes promoting survival and up-regulated in cytokines were enriched at T1D risk loci. Using SNP-SELEX, we identified 2,229 variants in cytokine-responsive cCREs altering transcription factor (TF) binding, and variants altering binding of TFs regulating stress, inflammation, and apoptosis were enriched for T1D risk. At the 16p13 locus, a fine-mapped T1D variant altering TF binding in a cytokine-induced cCRE interacted with SOCS1, which promoted survival in cytokine exposure. Our findings reveal processes and genes acting in beta cells during inflammation that modulate T1D risk.

eScholarship - University of California

Helsingin yliopiston digitaalinen arkisto

MDC Repository

iPSCORE: A Resource of 222 iPSC Lines Enabling Functional Characterization of Genetic Variation across a Variety of Cell Types.

Author: Adler Eric
Aguirre Aitor
Arias Angelo D
Benaglio Paola
Berggren W Travis
Borja Victor
Chi Neil C
Cook Megan
D'Antonio Matteo
D'Antonio-Chronowska Agnieszka
Dargitz Carl T
DeBoever Christopher
Diffenderfer Kenneth E
Donovan Margaret KR
Drees Frauke
Evans Sylvia M
Farnam KathyJean
Feiring Rachel
Frazer Kelly A
Garcia Melvin
Goldstein Lawrence SB
Greenwald William W
Grinstein Jonathan D
Harismendy Olivier
Hashem Sherin I
Hishida Yuriko
Izpisua Belmonte Juan Carlos
Jakubosky David A
Jepsen Kristen
Li He
Loring Jeanne F
Matsui Hiroko
McGarry Thomas J
Miller Carl A
Modesto Veronica
Müller Franz-Josef
Nariai Naoki
Nelson Bradley C
O'Connor Daniel T
Okubo Jonathan
Panopoulos Athanasia D
Rao Fangwen
Reyna Joaquin
Schuldt Bernhard M
Smith Erin N
Williams Roy
Yeo Gene W
Zhao Chang
Publication venue: eScholarship, University of California
Publication date: 01/04/2017

Large-scale collections of induced pluripotent stem cells (iPSCs) could serve as powerful model systems for examining how genetic variation affects biology and disease. Here we describe the iPSCORE resource: a collection of systematically derived and characterized iPSC lines from 222 ethnically diverse individuals that allows for both familial and association-based genetic studies. iPSCORE lines are pluripotent with high genomic integrity (no or low numbers of somatic copy-number variants) as determined using high-throughput RNA-sequencing and genotyping arrays, respectively. Using iPSCs from a family of individuals, we show that iPSC-derived cardiomyocytes demonstrate gene expression patterns that cluster by genetic background, and can be used to examine variants associated with physiological and disease phenotypes. The iPSCORE collection contains representative individuals for risk and non-risk alleles for 95% of SNPs associated with human phenotypes through genome-wide association studies. Our study demonstrates the utility of iPSCORE for examining how genetic variants influence molecular and physiological traits in iPSCs and derived cell lines.

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Current Performance and On-Going Improvements of the 8.2 m Subaru Telescope

Author: Ando Hiroyasu
Aoki Kentaro
Aoki Wako
Chikada Yoshihiro
Doi Yoshiyuki
Ebizuka Noboru
Elms Brian
Fujihara Gary
Furusawa Hisanori
Fuse Tetsuharu
Gaessler Wolfgang
Harasawa Sumiko
Hayano Yutaka
Hayashi Masahiko
Hayashi Saeko
Ichikawa Shinichi
Imanishi Masatoshi
Ishida Catherine
Iye Masanori
Kaifu Norio
Kamata Yukiko
Kanzawa Tomio
Karoji Hiroshi
Kashikawa Nobunari
Kawabata Koji
Kobayashi Naoto
Kodaira Keiichi
Komiyama Yutaka
Kosugi George
Kurakami Tomio
Letawsky Michael
Mikami Yoshitaka
Miyashita Akihiko
Miyazaki Satoshi
Mizumoto Yoshihiko
Morino Junichi
Motohara Kentaro
Murakawa Koji
Nakagiri Masao
Nakamura Kyoko
Nakaya Hidehiko
Nariai Kyoji
Nishimura Tetsuo
Noguchi Kunio
Noguchi Takeshi
Noumaru Takeshi
Ogasawara Ryusuke
Ohshima Norio
Ohyama Yoichi
Okita Kiichi
Omata Koji
Otsubo Masashi
Oya Shin
Potter Robert
Saito Yoshihiko
Sasaki Toshiyuki
Sato Shuji
Scarla Dennis
Schubert Kiaina
Sekiguchi Kazuhiro
Sekiguchi Maki
Shelton Ian
Simpson Chris
Suto Hiroshi
Tajitsu Akito
Takami Hideki
Takata Tadafumi
Takato Naruhisa
Tamae Richard
Tamura Motohide
Tanaka Wataru
Terada Hiroshi
Torii Yasuo
Uraguchi Fumihiko
Usuda Tomonori
Weber Mark
Winegar Tom
Yagi Masafumi
Yamada Toru
Yamashita Takuya
Yamashita Yasumasa
Yasuda Naoki
Yoshida Michitoshi
Yutani Masami
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2004

An overview of the current status of the 8.2 m Subaru Telescope constructed and operated at Mauna Kea, Hawaii, by the National Astronomical Observatory of Japan is presented. The basic design concept and the verified performance of the telescope system are described. Also given are the status of the instrument package offered to the astronomical community, the status of operation, and some of the future plans. The status of the telescope as reported in a number of SPIE papers as of the summer of 2002 is incorporated, with some updates included as of February 2004. However, readers are encouraged to check the most up-to-date status of the telescope through the home page, http://subarutelescope.org/index.html, and/or by direct contact with the observatory staff. Comment: 18 pages (17 pages in published version), 29 figures (GIF format). This is the version before the galley proof.

arXiv.org e-Print Archive

CERN Document Server

Clustering of Lyman Break Galaxies at z=4 and 5 in The Subaru Deep Field: Luminosity Dependence of The Correlation Function Slope

We explored the clustering properties of Lyman Break Galaxies (LBGs) at z=4 and 5 with an angular two-point correlation function on the basis of the very deep and wide Subaru Deep Field data. We found an apparent dependence of the correlation function slope on UV luminosity for LBGs at both z=4 and 5. More luminous LBGs have a steeper correlation function. For comparison with these observational results, we constructed numerical mock LBG catalogs based on a semianalytic model of hierarchical clustering combined with high-resolution N-body simulation, carefully mimicking the observational selection effects. The luminosity functions for LBGs predicted by this mock catalog were found to be almost consistent with the observation. Moreover, the overall correlation functions of LBGs were reproduced reasonably well. The observed dependence of the clustering on UV luminosity was not reproduced by the model, unless subsamples of distinct halo mass were considered. That is, LBGs belonging to more massive dark halos had steeper and larger-amplitude correlation functions. With this model, we found that LBG multiplicity in massive dark halos amplifies the clustering strength at small scales, which steepens the slope of the correlation function. The hierarchical clustering model could therefore be reconciled with the observed luminosity-dependence of the angular correlation function, if there is a tight correlation between UV luminosity and halo mass. Our finding that the slope of the correlation function depends on luminosity could be an indication that massive dark halos hosted multiple bright LBGs (abridged). Comment: 16 pages, 17 figures, Accepted for publication in ApJ, Full resolution version is available at http://zone.mtk.nao.ac.jp/~kashik/sdf/acf/sdf_lbgacf.pd

arXiv.org e-Print Archive

Crossref

A crowdsourced set of curated structural variants for the human genome.

Author: Ahmed Azza E
Alexander Noah
Blankenberg Daniel
Brueffer Christian
Carroll Andrew
Chapman Lesley M
Clarke Wayne E
Dawson Eric
Jones Garan
Kolora Sree Rohit Raj
Lim Chun Shen
Nariai Naoki
Narzisi Giuseppe
Pai Patrick
Proukakis Christos
Salit Marc
Shehreen Saadlee
Smith Graeme
Spies Noah
Watson Christopher M
Wenger Aaron M
Wolujewicz Paul
Xiao Chunlin
Zook Justin M
Publication venue: PLoS Comput Biol
Publication date: 01/06/2020

Funder: U.S. Food and Drug Administration; funder-id: http://dx.doi.org/10.13039/100000038

A high-quality benchmark for small variants encompassing 88 to 90% of the reference genome has been developed for seven Genome in a Bottle (GIAB) reference samples. However, a reliable benchmark for large indels and structural variants (SVs) is more challenging. In this study, we manually curated 1235 SVs, which can ultimately be used to evaluate SV callers or train machine learning models. We developed a crowdsourcing app, SVCurator, to help GIAB curators manually review large indels and SVs within the human genome, and report their genotype and size accuracy. SVCurator displays images from short, long, and linked read sequencing data from the GIAB Ashkenazi Jewish Trio son [NIST RM 8391/HG002]. We asked curators to assign labels describing SV type (deletion or insertion), size accuracy, and genotype for 1235 putative insertions and deletions sampled from different size bins between 20 and 892,149 bp. 'Expert' curators were 93% concordant with each other, and 37 of the 61 curators had at least 78% concordance with a set of 'expert' curators. The curators were least concordant for complex SVs and SVs that had inaccurate breakpoints or size predictions. After filtering events with low concordance among curators, we produced high confidence labels for 935 events. The SVCurator crowdsourced labels were 94.5% concordant with the heuristic-based draft benchmark SV callset from GIAB. We found that curators can successfully evaluate putative SVs when given evidence from multiple sequencing technologies.

Lund University Publications

Directory of Open Access Journals

UCL Discovery

Apollo (Cambridge)

White Rose Research Online